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ABSTRACT 

In  order  to  bo  useful,  an  approximate  solution  y of  a nonlinear  system  of 

n * 

equations  f (x)  =0  in  R must  be  close  to  a solution  x of  the  system.  Two 

* 

theorems  which  can  be  used  ccmputat ional ly  to  establish  the  existence  of  x and 

* 

obtain  bounds  for  the  error  vector  y - x are  the  1948  result  of  I..  V.  Kantorovich 

and  the  1977  interval  analytic  theorem  due  to  R.  E.  Moore.  The  two  theorems  are 

* 

compared  on  the  basis  of  sensitivity  (ability  to  detect  a solution  x close  to 
y) , precision  (ability  to  qive  sharp  error  bounds),  and  computational  complexity 
(cost).  A theoretical  comparison  shows  that  the  Kantorovich  theorem  has  at  best 
only  a slight  edge  in  sensitivity  and  precision,  while  Moore's  theorem  requires 
far  less  computation  to  apply,  and  thus  provides  the  method  of  choice.  This 
conclusion  is  supported  by  a numerical  example,  for  which  available  ITNIVAC  1108. 

1 1 1 0 software  is  vised  to  check  the  hypotheses  of  both  theorans  automatically, 
given  y and  f . 
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Significance  and  Explanation 


This  report  shows  a way  to  save  money  in  the  evaluation  of  accuracy  of  approxi- 
mate  solutions  of  finite  systems  of  nonlinear  algebraic  and  trancendental  equations, 
based  on  the  comparison  of  an  older  method  for  this  purpose  with  a newer  one.  It  is 
shown  that  the  two  methods  are  theoretically  about  equally  effective,  while  the  newer 
one  is  cheaper  computationally.  This  conclusion  has  been  verified  in  practice,  using 
operational  UNIVAC  1108/1110  software.*- 

In  the  case  of  finite  linear  systems  of  equations,  there  is  a simple  theory 
based  on  a finite  number  of  arithmetic  operations.  For  linear  systems  of  moderate 
size,  the  successful  computation  of  a vector  y=  (y^ ,y2 » • • • »Yn)  can  imply  that  it 
is  close  to  the  actual  solution  x*  = (x*  ,x*  , . . . ,x*) . For  nonlinear  systems,  the 
situation  is  less  well  understood  in  both  theory  and  practice,  although  many  numerical 
methods  are  known  which  will  furnish  approximate  solutions  y . The  concern  here  is 
not  with  how  y has  been  found,  but  rather  with  the  determination  of  its  accuracy  as 
an  approximation  to  an  exact  solution  x*. 


Often,  x*  is  defined  as  the  limit  of  an  infinite  process,  so  the  computed  value 
y will  be  contaminated  by  both  truncation  and  roundoff  error.  For  y to  be  useful, 
one  must  be  able  to  assert  that  it  is  satisfactorily  close  to  x*.  In  order  of  in- 
creasing expense,  such  assertions  can  be  based  on  simple  (but  perhaps  misguided)  faith 
in  the  output  of  a computer,  usually  reliable  but  not  truly  rigorous  heuristic  obser- 
vations, or  the  use  of  theorems  which  give  guaranteed  results.  The  latter  degree  of 
certainty  may  be  required  in  certain  applications,  in  addition  to  its  intellectual 
necessity,  and  involves  proving  the  existence  of  x*  as  well  as  obtaining  error 
bounds  for  the  vector  y - x*. 


Certain  criteria  are  established  for  theorems  which  prove  existence  of  solution 
vectors  and  give  error  bounds.  First,  they  must  be  computational , so  that  a computer 
program  can  be  written  to  check  out  the  hypot'eses  of  the  theorem,  given  y and  the 
system  of  equations.  Thus,  the  proof  of  existence  and  error  bounds  will  be  provided 
automatically,  or  an  indication  of  the  reason  for  failure  of  the  theorem  to  hold  will 
be  given.  A second  desirable  property  is  sensitivity ; if  y is  close  to  a solution, 
then  the  theorem  should  detect  the  existence  of  x*  rather  than  give  a negative  re- 
sult. Also  wanted  is  precision , so  that  the  bounds  for  y - x*  will  be  as  sharp  as 
possible,  to  help  avoid  wasting  money  by  refining  an  already  sufficiently  accurate 
approximation.  Finally,  the  theorem  should  be  as  simple  (cheap)  to  implement  as  pos- 
sible.  The  computation  of  y might  be  expensive,  and  as  little  as  necessary  should 
be  added  to  this. 

A metric  existence  theorem  suitable  for  automation  was  published  by  I,.  V . 
Kantorovich  in  1948.  At  MRC,  software  for  implementation  of  this  theorem  was  announced 
in  1967,  and  an  improved  version  was  released  in  1972.  In  1977,  R.  E.  Moore  published 
an  interval  analytic  existence  theorem  to  which  this  software  was  readily  adaptable, 
as  automatic  differentiation  and  interval  arithmetic  were  already  implemented  in  it. 

A theoretical  ccmparison  of  these  theorems  of  different  type  is  achieved  by  recasting 
Moor  's  theorem  as  a metric  theorem.  It  turns  out  that  the  Kantorovich  theorem  has 
only  a slight  edge  in  sensitivity  and  precision  over  this  restricted  version  of  the 
Moore  theorem,  but  the  latter  is  computationally  much  simpler  and  hence  preferable. 

A numerical  example  is  given  to  support  this  conclusion.  This  example  arose  in 
an  unrelated  investigation  of  numerical  solution  of  nonlinear  integral  equations  of 
radiative  transfer  type,  and  was  not  constructed  for  this  purpose.  The  use  of  Moore's 
theorem  qave  a 4 to  1 advantage  in  speed,  and  detected  the  existence  of  a solution 
which  tile  supposedly  more  sensitive  Kantorovich  theorem  failed  to  do. 

The  responsibility  for  the  wording  and  views  expressed  in  ttiis  descriptive  summary 
lies  with  MRC,  and  not  with  the  author  of  this  rof>ort . 
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ABSTRACT . In  order  to  be  useful,  an  approximate  solution  y of  a nonlinear 

n * 

system  of  equations  f (x)  = 0 in  R must  be  close  to  a solution  x of  the 

system.  Two  theorems  which  can  be  used  computationally  to  establish  the  exist- 
* * 
ence  of  x and  obtain  bounds  for  the  error  vector  y - x are  the  1948  result 

of  L.  V.  Kantorovich  and  the  1977  interval  analytic  theorem  due  to  R.  E.  Moore. 

The  two  theorems  are  compared  on  the  basis  of  sensitivity  (ability  to  detect  a 

* 

solution  x close  to  y) , precision  (ability  to  give  sharp  error  bounds) , and 
computational  complexity  (cost) . A theoretical  comparison  shows  that  the 
Kantorovich  theorem  has  at  best  only  a slight  edge  in  sensitivity  and  precision, 
while  Moore's  theorem  requires  far  less  computation  to  apply,  and  thus  provides 
the  method  of  choice.  This  conclusion  is  supported  by  a numerical  example,  for 
which  available  UNIVAC  1108/1110  software  is  used  to  check  the  hypotheses  of 
both  theorems  automatically,  given  y and  f . 

1.  Nonlinear  Systems  of  Equations.  A system  of  n nonlinear  algebraic  or 


transcendental 

equations 

in  n 

real 

unknowns , 

C (x. 

,x„ , 

— ,x  ) = 0 

1 1 

2 

n 

(1.1) 

1 

f2(Xl 

, X2  r 

• * • f x ) = 0 

n 

s.  fn<Xl 

,x2. 

— ,x  ) = 0 
n 

may  be  represented  concisely  in  the  real  n-dimensional  vector  space  RU  as  the 
equation 

(1.2)  f(x)  = 0 , 

where  f : D c r°  ->  Rn  is  a nonlinear  operator  from  a domain  D c Rn  into  Rn. 
The  problem  of  solving  the  system  (1.1)  or  the  equivalent  equation  (1.2)  is,  of 
course,  to  find  a iolnuCion  x = (x^ ,x2 , . . . ,Xr)  e D which  f maps  into  the 
origin  0 = (0,0,... ,0)  of  Rn  . 

In  order  to  keep  the  discussion  of  this  problem  within  the  realm  of  prac- 
tical computation,  it  will  be  assumed  that  the  functions  f (x^ ,x  , . . . jX^) , 
i = l,2,...,n,  comprising  the  components  of  f(x)  can  be  written  as  formulas  in 
FORTRAN  or  some  similar  computer  language.  As  software  exists  for  analytic 
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differentiation  of  functions  of  this  type  [4,9]  , it  will  also  be  assumed  without 


further  ado  that  the  derivatives  f* 
•> 


3f^/3Xj  and,  if  necessary,  f£ 


3 f./3x.3x  can  be  computed  automatically,  so  that,  in  particular,  the  evaluation 
1 J K 

of  the  Jacobian  matrix 


f ' (x) 


presents  no  difficulty  for  x e D . 


3f.(x) 


2.  Existence  theorems  and  error  bounds.  Actual  computation  with  one  of  the  many 

methods  for  solving  systems  of  equations  will  yield  an  approximate  Aoiution  y , 

• 

which  will  be  useful  if  it  can  be  asserted  that  y is  "close"  to  a solution  x 

* 

of  (1.2).  This  involves  establishing  (i)  the  Cxiite/lCC  of  x in  seme  region 
• * 

Si  containing  y , and  (ii)  seme  type  of  bound  for  the  eAAOA  vectoA  c = x - y 

or  its  components. 

In  the  case  of  linear  systems  Ax  = b,  verification  of  existence  rests 
on  a finite  number  of  aritimetic  operations.  For  linear  systems  of  reasonable 
size,  errors  in  computed  approximate  solutions  result  only  from  rounding,  and 

successful  execution  of  a carefully  written  program  can  insure  the  existence  of 

* 

x and  provide  usable  error  estimates  [2].  However,  in  the  nonlinear  case,  such 

* 

assurance  of  existence  may  be  lacking.  Furthermore,  even  if  known  to  exist,  x 

may  be  defined  only  as  the  limit  of  an  infinite  process,  in  which  case  y will 

* 

differ  from  x due  to  truncation  as  well  as  roundoff  error. 

* 

Some  common  criteria  for  accepting  y as  a good  approximation  to  x are 
(i)  the  AC-iidual 

(2.1)  r = f (y) 
is  small,  or  (ii)  the  COArcction 

(2.2)  6.  = x(k+1)  - x(k) 

k 

* (k) 

is  small  when  calculating  x as  the  limit  of  a sequence  (x  } , in  which  case 

one  takes  y = X^+D.  Either  of  these  can  fail  for  nonlinear  f in  R . For 


example,  if  f(x) 


, then  the  residual  r can  be  made  arbitrarily  small,  but 

. 1 . , rt  ni _•  , 1 ..  _ ■ . . ■ ( k+  1 ) 


x does  not  exist  such  that  f (x  ) = 0 . Similarly,  an  iteration  process  x = 

(k)  * 

4>(x  ) can  stall  far  from  x . To  see  this,  apply  Newton's  method  to  (1.2)  with 

f (x)  = xm  and  x^  = 1.  The  correction  6 can  be  made  as  small  as  desired  by 

° (1) 

taking  m sufficiently  large;  the  value  obtained  for  y = x will  be  close  to 

* 

1 and  thus  not  a good  approximation  to  x = 0 . 

It  follows  that  there  is  a need  for  computational  existence  theorems  such 
that  given  y and  f , their  hypotheses  can  be  checked  by  a computer  program, 
and  error  bounds  obtained  if  existence  is  verified.  Other  desirable  properties 


-2- 


for  such  theorems  are : 

(i)  Sensitivity.  The  region  S2  in  which  the  theorem  can  detect  t lie 

existence  of  a solut ion  x of  equation  (1.2)  is  as  larqe  as  possible. 

• * 

(ii)  Precision.  The  region  i)  in  which  x is  guaranteed  to  exist 

does  not  extend  far  beyond  x , so  that  the  error  bounds  obtained  are  as  good 
.as  possible. 

(iii)  Simplicity.  The  additional  computation  required  to  guarantee  exist- 
ence and  obtain  error  bounds  should  bo  as  inexpensive  as  possible. 

To  a certain  extent,  sensitivity  and  precision  are  incompatible,  at  least  in 
a single  theorem.  A sensitive  theorem  might  establish  existence  of  x in  a 
large  region,  but  not  yield  usable  error  bounds.  A result  requiring  highly  pre- 
cise approximate  solvit  ions,  on  the  other  hand,  might  fail  to  detect  a solution 
x which  is  close  enough  to  make  the  accuracy  of  y satisfactory  for  the  in- 
tended purpose.  Consequently,  some  compromise  between  sensitivity  and  precision 
must  be  struck.  Of  course,  a computational  strategy  can  be  devised  in  which  a 
sensitive  theorem  is  u s «\i  to  scout  a large  region  for  suitable  initial  approxima- 
tions, which  are  then  refined  until  a precise  theorem  guarantees  sufficient 
accuracy  (13]. 

The  purpose  of  this  paper  is  to  compare  two  computational  existence  theorems, 
for  which  UNIVAC  1108/1110  software  is  operat ional , on  the  basis  of  sensitivity, 
precision,  and  simplicity. 


'•  ''he  theorems  of  Kantorovich  and  Moore.  Theorems  which  can  be  implements!  com- 
putationally to  obtain  verifiable  conditions  for  existence  of  solut ions  as  well  as 
error  bounds  include  the  well-known  result  on  the  convergence  of  Newton's  method 
due  t o I..  V.  Kantorovich  ((•>]  and  t lu'  more  recent  interval  -analyt  ie  theorem  of 
K.  K.  Moore  (12].  Brief  statements  of  these  theorems  will  now  be  given;  proof s 
may  be  found  in  the  literature.  In  particular,  a neat  proof  of  the  form  of  the 
Kantorovich  theorem  presented  here  may  bo  found  in  five  mite  by  Ortega  (141. 

I.et  II  *11  denote  a norm  for  k"  and  also  a consistent  matrix  norm  foi  um\ 
real  matrices;  that  is,  II  Axil  II  All  *11x11  for  all  x < k"  and  square  matrices  A 
of  order  n with  real  elements.  The  ingredients  of  the  Kantorovich  t heorixm  are 
(i)  an  initiat  petit  t (approximate  solution)  x^  at  which  t he  Jacobian  matiix 
f'(x^1')  is  invertible,  with 

(3.1)  II  (f  ’ (X  <0>  ) ] “l)l  ^ B0  , 

and  (ii)  a Lipschits  constant  *.  for  f'  such  that 


( »..’) 


Ilf'(u)  - f • <v)  II  • kIIu-vII  , u,v  < iJ  , 


-1- 


(0) 


where  Q is  a sufficiently  large  region  containing  x . (The  meaning  of 
"sufficiently  large"  will  be  made  precise  below.)  From  (i)  , the  NfcurfOH  po-int 


x(1)  - x(0)  - [f ,(x(0))f1f <x(0)) 


(3.3) 

is  uniquely  defined,  and  one  can  find  a constant 

(3.4) 

Theorem  3.1  (Kantorovich) . If 

(3.5) 


nQ  such  that 


„ (1)  (0)..  _ 

II  x - x II  <_  nQ 


h.  » B.  tc  < — , 
0 0 0 —2 


and  G(x(0) ,pQ)  - {x 


Ix-x 


(0), 


^ PQ)  C n for 


1 - / l-2h„ 


(3.6) 


then  there  exists  a solution  x t U(x 


0 

(0) 


^0  ' 


,pQ)  of  equation  (1.2). 


The  conclusion  of  this  theorem  provides  a guarantee  of  the  existence  of  a 

* 

solution  x and  also  the  error  bound 


(3.7) 


i * (0), 

Ix-x  I 


^P0 


for  x 

(1) 


(0) 


x instead  of  x 

error  bound 


as  an  approximate  solution  of  equation  (1.2).  Using  the  Newton  point 
^ as  the  approximate  solution,  one  can  obtain  the  sharper 


(3.8) 


i * <1>, 

Ix-x  I 


i-v 

- — r "o 


as 


as  shown  by  Gragg  and  Tapia  [3).  In  practice,  one  may  take  fl  - U(x^°  ,2ryj) 

if  »c  is  the  Lipschitz  constant  for  this  region,  then  U(x^  ,pg)  c u(x^  ,2nQ) 

if  and  only  if  h„  < zr  • 

0 — 2 


Moore's  theorem  is  based  on  the  concepts  of  interval  analysis  (11).  Given 

n 


vectors  a - (a,, a.,..., a ) and  b 
12  n 


(bl'b2 bn} 


i - l,2,...,n,  the  intCtoat  X - [a,b] 
(3.9)  X - [a ,b] 


in  R1 


in  R 
n 

n is  defined  to  be 


with  a.  < b.  , 
l — l 


(x:a^  <_  xA  i “ 1,2,.  ..,n). 


The  Ot ((Aval  extension  G of  a continuous  function  g:D  c Rn  Rn  may  be  con- 


structed on  intervals  X c d in  the  following  way:  For  g(x) 


(gx(x)  , g (x) , 


g (x) ) , take 
n 


(3.10) 


min  g.(u)  , dA 
u ( x 


max  gi (v) , i - 1,2, 


v < x 


and  define  G(X)  - (c,dl  . It  follows  that  G(X)  a lg(x):X(X)  - g(X);  however, 

G (X)  may  also  contain  vectors  which  are  not  of  the  form  g(x)  for  seme  x t X . 


-4- 


The  recipe  for  Moore's  theorem  [12]  calls  for  an  initial  point  y , an 
interval  X containing  y , and  a nonsingular  real  matrix  Y . These  are  used 
to  form  the  K'lOWCZi/k  t-ian&^ofvmaX-ion 

(3.11)  K (X)  = y - Yf  (y)  + { I-YF ' (X)  } (X-y) 

of  the  interval  X , where  I is  the  identity  matrix,  and  F'  denotes  the  interval 
extension  of  the  derivative  f'  of  f . 

Theorem  3.2  (Moore).  If 

(3.12)  K(x)  c X , 

* 

then  there  exists  a solution  x e K(X)  of  equation  (1.2). 

This  theorem  also  provides  error  bounds  in  addition  to  a guarantee  oi  the 
existence  of  a solution.  Setting  K(X)  = [c,d] , one  obtains  the  componentwise 
error  bounds 

(3.13)  lxi~yiJ  — max  ^ I y i~ci I * ldi~yi 1^  • * = 1 > 2 » — ,n, 

for  y as  an  approximate  solution  of  equation  (1.2).  Error  bounds  analogues  to 
(3.8)  may  be  obtained  by  setting 

(3.14)  w = y-Yf  (y)  , [s,t]  = {I-YF ' (X)  } (X-y)  , 

from  which 


x . -w.  < t ,-s . , 

' 1 1 ' — 1 1 


i — 1,2, ... ,n. 


(3.15) 
as  (x-w)  t [s,t]  . 

Although  different  in  appearance,  Theorems  3.1  and  3.2  have  a common 
Define  the  Newton  Op&UltOA.  $ by 


(3.16) 

and  the  Me icton  4equence  tx 

(3.17) 


4>(x) 


(k) 


• X - [f'(x)l 
by 


X(k+1)  = *(x(k)), 


f (x) 


k = 0,1,2,. 


Theorem  3.1  gives  sufficient  conditions  for  the  convergence  of  the  Nov.' 
(0)  * 

starting  from  x to  a solution  x of  equation  (1.2).  The  Krawc: \ 

K defined  by  (3.11)  stems  from  an  adaptation  of  the  Newton  op>erator  (i. 
the  iteration  process  (3.17)  to  interval  computation  [8].  Under  the  hy> 
of  Theorem  3.2,  the  operator  ip  defined  by 


kground . 


(3.18) 

will  have  a fixed  point 
by  the  invert ibility  of 


i['(x)  = x - Y f (x) 


x e K (X) , which  is  also  a solution  of  equal  ion 
Y [12]. 


juence 

rator 

and 

ses 


l .2) 


Moore's  theorem  in  R 


Because  of  the  different  character  of  Theorans  3.1 


and  3.2,  it  will  be  necessary  to  make  some  special  assumptions  in  order  to  compare 
them.  As  the  metric  topology  for  intervals  [11,  pp.  15-24]  is  closely  related  to 
the  norm 


(4.1) 


for  Rn  , the  method  of  comparison  will  be  to  reformulate  Moore's  theorem  as  a (less 
general)  metric  theorem,  which  will  then  be  compared  to  the  Kantorovich  theorem  in 
. The  following  concepts  will  be  needed. 

The  closet!  ball  U(y,p)  in  Rn  with  center  y and  radius  p is  a special 

CO 

type  of  interval,  namely 

(4.2)  U(y,p)  = { x : II x— yll  ^ p)  = [y-pe,y+pe]  , 

where  e * (1,1,..., 1).  In  particular,  the  clo&cd  unit  ball  U(0,1)  in  R*^  is 
simply  the  interval  [-e,e] . The  magnitude  of  the  scalar  interval  [a, 6]  in  R 
is  defined  to  be 

(4.3)  | [a, 8]  | = maxi  | ci | , |p|  } . 

Similarly,  for  an  interval  X = [a,b]  in  Rn  , 

(4.4)  jx|  = max  {max  (|a.|,|b.|)}  = max  {II  all  , II  loll  } = max  llxll 

(i)  1 1 xt  X 

in  terms  of  the  norm  (4.1).  The  uxidth  of  X is 

(4.5)  w(X)  = max  {(b.-a.l)  * Itb-atl  , 

(i)  1 1 

and  the  midpoint  of  X is  m(X)  * ^-(a+b) . The  matrix  norm  corresponding  to  (4.1) 


max  { |x . | } 
(i)  1 


for  matrices  A 
(4.6) 


(a . . ) is 
13 


II  All 


n 

max  J 


a.  . 

ID 


(i)  j=l 

It  follows  from  this  and  (4.4)  that 

(4.7)  II All  = |A(-e,e]  | . 

For  a matrix  M = (ta^rB^jl)  with  interval  components,  one  has 

n 

(4.8)  | M I = max  J | [a.  . ,8.  .1  | = max  { II All  > = |M[-e,e]  | . 

(i)  j = l 13  13  AtM 

In  Theorem  3.2,  suppose  that  B >_  II Yll  , n >_  lly-wll  , where  w * y - Y f (y)  , and 

(4.9)  X = [y  - pe,  y + pe]  *=  U(y,p). 


The  Krawczyk  transformation  of  X is  thus 

(4.10)  K (X  ) 

p 

Also,  suppose  that 

w(p) 


w + pll  - YF’ (X  ) > [-e,e]  . 

P 


(4.11) 


Y-1  - F' (X  ) 

P 
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I 


Lemma  4.1. 


If  Bu(  p)  <1  , then  K(X  ) c X for 

P P 


(4.12) 


- 1 


Bw(p) 


Proof:  As  I , Y are  real  matrices,  one  may  write 


(4.13) 


{i-yf'(x)}  = y{y_1-f'(x)}  . 


For  v e p{l-YF*  (X) } [-e,e] , (w+v)  e K(X  ) , and 

P 

(4.14)  Ity  - (w+v)  II  lly-wll  + llvll  <_  n + Bpu(p)  , 

so  that  II y - (w+v)  II  <_  p if  (4.12)  holds,  and  thus  (w+v)  e X^  . QED. 

An  interval  of  the  form  [-a, a]  is  said  to  be  Aymme£AA.C..  As  a linear 

transformation  of  a symmetric  interval  is  symmetric,  one  has 

(4.15)  {I  - YF ' (X  )}[-e,e)  = [-c,c], 

P 

and  thus  K(X  ) = [w-pc,  w+pc] . The  next  step  in  the  comparison  of  Theorems  3.1 
P 

and  3.2  will  be  to  use  the  Lipschitz  continuity  of  f’  to  obtain  a scalar  majorant 
function  for  u(p) . If  f'  is  Lipschitz  continuous  on  an  interval  X (with 
Lipschitz  constant  tc  ) , then  each  component  f|  of  f'  is  also  Lipschitz 

continuous  on  X , that  is 

(4.16)  |f!.(u)  - f!.(v)|  < A . . II  u— v||  , u,v  e X 

13  13  1 ~ 13 


Define 

(4.17) 


X = max  V X . . 

t-*  in 


13 


(i)  j=l 

It  follows  that  X will  be  a Lipschitz  constant  for 
assume  k <_  X . 

Lemma  4.2.  If  y e X , then 
(4.18)  | f ' (y)  - F ’ (X)  | <_  X |y-X  | . 

Proof:  By  (3.10),  (4.4),  and  (4.16), 


on  X , so  that  one  may 


(4.19) 


f!  . (y)  - FJ  . (X)  I < X.  . max  {||y-x||  } = X.  . |y-x|  , 


13 


13 


- 13 


x e X 


13 


i,j  = l,2,...,n.  Inequality  (4.18)  now  follows  from  (4.8),  (4.16),  and  (4.17).  OED. 

The  choice  Y = [f'(y)]  ^ now  gives  a metric  theorem  which  allows  direct  com- 
parison of  Theorems  3.1  and  3.2. 

Theorem  4.1.  If 


(4.20) 


then  K (X  ) c x 
P P 


h = BX  n ^ , 


for  p such  that 


(4.21) 


and  X c x 

P 


1 - /l^4h 
2h 


< P 


1 + /l-4h 
2h 


p , and 


Proof : As  y - m(X  ) is  the  midpoint  of  X , y-X  ■ — w(X  ) » 

p p 1 p 1 2 p 


one  may  take 
(4.22) 


w ( p ) “ X p 


by  (4.11)  and  (4.18).  Thus,  if  h <_  — , then  inequality  (4.12)  may  be  solved  to 


give  (4.21),  and  the  conclusion  follows  from  i,enma  4.1. 


QED. 


In  order  to  use  Theorem  4.1  to  compare  the  theorems  of  Kantorovich  and  Moore, 


(0) 


it  is  natural  to  take  y - x , Y - Yp  , where 


(4.23) 


(1) 


YQ  - [f(x(0))]-1 


so  that  w = x ,B“Bo'n*nO‘  Thus*  ^or 


(4.24) 
one  has 


,<0) 


[x^0)  - 2n0e,x^°^  + 2h0e]  « 0 , 


(4.25) 


K(X<0)) 


x(1)  + 2n0{l-Y0F’ (X(0)) }[-e,e]  . 


Corollary  4.1.  Under  the  hypotheses  of  Theorem  3.1,  if 


(4.26) 


0 - 4X  ' 


MX<0’>  c x«» 


Proof:  This  follows  immediately  from  Theorem  4.1,  as  hQ  = ( X/tc)  h . QED. 


Remark  4.1.  As  (4.26)  requires  that  h^  — even  if  k = X , Corollary  4.1 


provides  a comparison  of  Theorem  3.1  with  Theorem  3.2  which  is  unfavorable  to  the 
latter.  Furthermore,  this  cannot  be  improved  in  general,  as  the  following  example 

1.  This  gives 


shows.  In  R , take  f (x)  - x*  - c“  for  0 < r < 1 and  x ^ 


(4.27) 


(1) 


12  12 

jU  + e ) . rip  - jd  - r ) , 


from  which  X^  « [e2,2  - f2]  . As  ®q  ” T an<*  k ■ X * 2,  it  follows  that 


(4.28) 


0 


h = |(1  - c2) , 


and  Theorem  3.1  guarantees  the  existence  of  the  solution  x * e of  f (x)  =0  in 
X(0)  for  0 <_  e <_  1.  On  the  other  hand,  direct  computation  with  (4.25)  yields 


(4.29) 


K(x(0))  - [-i+|c2 


4 3 3 2 4, 

e , i * r + c ]' 


and  it  is  easy  to  verify  that  K(X^°S  c x^°*  if  and  only  if  ^ <_  e2  1 , that  is, 

0 1 ho  ; 7 • 


A result  more  favorable  to  Moore's  theorem  may  be  obtained  by  making  different 


choices  of  y,  Y,  and  X than  those  above.  If  the  hypotheses  of  Theorem  3.1  are 

(k) 


satisfied,  then  the  Newton  sequence  {x  ) generated  by  (3.17)  is  well  defined, 
as  are  the  sequences  of  real  numbers  {r^J,  (B^)#  (h^ } satisfying  the  relationships 
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1 


i 


(4.31) 


v(k)  . (k) 

X = (x  - 2 n e 
k 


(k)  _ , 

x + 2nke] 


x 


(0) 


(4.32) 


Theorem  4.2.  If  the  hypotheses  of  Theorem  3.1  hold  with 

1 


0 


(k) 


then  there  exists  a positive  integer  k such  that  for  y = x , V 
(f ' (x  ))  , one  lias  h(.\  ) «■  X 

Proof:  If  (4.32)  holds,  then  the  numbers  h^  decrease  rapidly  with  k 

In  fact,  the  recurrence  relations  (4.30)  can  be  solved  [3]  to  give 

„k 


2 P 


(4.33) 

where 

(4.34) 


0 


1+0 

1 0 


,k  32 


k = 0,1,2 


1 - » l-2h. 
°0  ~ 1 + »’l^2h 


0 


and  0 < 1 if  (4.32)  holds.  Thus,  a positive  integer  k will  exist 


\-4l 


(4.35) 

and  the  conclusion  will  follow  from  Theorem  4 . 1 , as  h = (\/k)h 


1 


k - 4 ' 

Remark  4.2.  The  condition  (4.32)  implies  quadratic  convergence 

i ..(0) 


sequence  t x 
of  f in  the  sense  that  [f'(x  )] 


to  x ; furthermore,  x will  be  unique  in  X and 
* . -1 


will  exist  (161.  The  case  h 


0 


hat 


yyp. 

Newt  on 
zero 

■ a 


borderline  situation,  which  corresponds  to  linear  convergence  of  the  Nov  on  sequence 
with  ratio  -q  (6).  Consequently,  if  rapid  convergence  of  the  Newton  seq'+r.  e to 
the  approximate  solution  y has  been  observed  computationally,  then  it  likely 


that  the  value  of  h^  corresponding  to  x^ 
that  either  Theorem  3.1  or  3.2  is  applicable. 


= y satisfies  (4.26)  immediately 
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5.  Precision  and  sensitivity  to  simple  zeros.  On  the  assumption  that  h^ 
satisfies  (4.26),  it  follows  from  Thereom  3.1  thct  for  y = x^  , 


1 - /l-2h 


ly  - x II  < 


n0  P0  ' 


and,  similarly.  Theorem  4.1  gives,  by  (4.20)  , 


ly  - x II  < 


1 - /1-4 ( A/k)  h£ 
2 ( A/k)  h ~ 


n0  = P 


If  0 < hQ  £ </4A,  then  it  is  evident  that 

(5.3)  P0  < P . 

so  that  the  Kantorovich  theorem  is  of  greater  precision  than  the  metric  version 
of  Moore 1 s theorem . However , 

p0 

(5.4)  lim  — = 1 , 

v°  p 

so  that  the  difference  may  be  inconsequential  for  h^  sufficiently  small. 

To  compare  sensitivity,  suppose  that  x is  a simple  zero  of  f , and 

(5.5)  B*  > II  [f’(x*)]_1|l. 

* 

Given  Lipschitz  constants  k,A  in  a sufficiently  large  region  fi  containing  x , 
one  may  take 

1 - \ B kIIx(0)-x  II  * (Q)  * 

(5-6)  h°=  n~  r*  „ (0)  *„ . 2 B Kllx  -xl1 

(1-B  kIIx  -x  II) 

( o)  ★ * 

in  Theorem  3.1,  provided  that  ||x  -x  II  < 1/B  ic  [16]  . From  (5.6)  , 


* « £ — 


= 0(V  ' 


which  may  be  used  as  a measure  of  sensitivity.  From  Theorem  3.1,  x will  be 
detected  if 


(5.8)  ||x'w  - x II  < o(j)  , 

(0)  - * 1 

that  is,  if  x e U(x  ,o(  — )),  while  Corollary  4.1  requires  that 


i <0)  *, 

lx  - x II 


i ° (ZT  ) * 


Thus,  as 


(5.10) 


■ °-6265--  • 

o(  2} 
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tht>  best  radius  of  detection  of  x that  ran  be  guaranteed  foj  the  metric  version 
of  Moore's  theorem  is  about  62.5*  of  the  corresjiondinq  value  for  t lie  Kantorovich 


theorem.  The  example  of  Remark  4.1  can  also  be  used  to  show  that  this  cannot  be 

;(P)  - x*.l  < o(I) 

1 

h„  v — , and  there  will  exist  a positive  integer  k such  that 


improved  in  general.  As  in  Theorem  4.2,  however,  if 


t lien 


0 2 

(5.11) 


(k) 


- x 


1*(  IT  ) 


(k)  V „ .el/  <10.  .-1 
y ” x ,Y«Y»[f'(x  )1 

* * 


4 \ 

’q' 

,(k) 


(k)  (k) 

for  x belonging  to  the  Newton  sequence  (x  ),  and  Moore's  theorem  with 


X"'*  - (x 

* * 

tect  x 

shown  in  Table  5.1  for  various  values  of 


(k) 


■h,.e, 


X 


(k)  a , , 

x ♦ 2 n.e] 

(k)  K * 

The  number  of  iterations  required  to  obtain  II x - x II  - e(-r)  is 

— 4 


will  de- 

1. 


0 


ho 

k 

0.41421 

1 

0.47648 

2 

0.49.197 

3 

0.49848 

4 

0.49962 

5 

0.49990 

6 

0.49997 

7 

0.49999 

8 

Table  5.1. 


Iterations  Required  to  Knsurc 
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6.  Computational  complexity.  The  computer  program  [9]  for  the  implementation  of 
the  Kantorovich  theorem  has  been  adapted  to  apply  Moore's  theorem  for  the  choices 
of  y,  Y,  and  X indicated  for  Corollary  4.1,  and,  as  an  additional  option,  for 
the  choice 

(6.1)  Y - [m(F' (X))]-1 


recommended  by  Moore  and  Jones  [13] . This  program  provides  an  experimental  as  well 
as  a theoretical  basis  for  the  comparison  of  the  computational  complexity  of  Theorem 
3.1  with  Theorem  3.2. 

To  make  effective  use  of  either  theorem,  software  is  needed  for  automatic  dif- 
ferentiation [7]  and  implementation  of  interval  computation  for  arithmetic  operations 
and  functions  ordinarily  encountered  in  FORTRAN  expressions  [17]  . In  addition,  the 
program  for  the  Kantorovich  theorem  requires  software  for  interval  matrix  inversion 
[5;  11,  pp.  32-39]  as  part  of  the  painstaking  calculations  required  to  guarantee 
that  upper  bounds  Bo'K'T1o  aro  obtained  for  the  quantities  indicated  in  (3.1),  (3.2), 
and  (3.4)  as  results  of  inexact  computations  with  machine  numbers. 

As  the  endpoints  of  intervals  employed  in  actual  calculations  must  be  machine 
numbers,  directed  rounding  is  used  after  each  operation  to  preserve  the  inclusion 
relationship.  For  this  and  other  reasons  [11]  , instead  of  the  exact  interval  exten- 
sion G of  a continuous  function  g , a computed  extension  G is  obtained,  which 
has  the  property  that  G(X)  ’ G(X)  for  all  X . Thus,  |g(X)  | _>  |g(X)  | and 
G(X)  i X implies  G (X)  i X , so  that  rigorous  conclusions  can  be  drawn  on  the  basis 
of  interval  calculations  with  computed  extensions.  In  what  follows,  it  will  be 
convenient  to  identify  a machine  (or  real)  number  z with  the  degenerate  interval 
z = [z,z] . 

The  calculation  of  BQ,n0  for  the  Kantorovich  theorem  are  fairly  straight- 
forward applications  of  interval  analysis.  The  I.ipschitz  constant  k,  however,  will 


be  obtain«>d  from  a bound  for  the  second  derivative 


f"(x) 


32fi(x) 
3x.  ;ix, 

j k 


which  will  be  called  the  Hessian  of  f . For  a real  bilinear  operator  p 
in  Rn  , 


(W 


n n 

(6.3)  llpll  - max  llpxll  £ max  }’  J | q | ■ p (p)  , 

11x11=1  (i)  j = l k=l  13 


with  a similar  expression  for  |p|  if  the  elements  of  p «re  intervals.  Thus, 
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a machine  number  k such  that 


n n 

(6.4)  k ^ u(F"  (X))  = max  j £ |f"  . (X)  | , 

(i)  j=l  k=l  13,1 

where  F"  is  the  computed  extension  of  f" , will  be  a Lipschitz  constant  for 
f ' on  X , as 

(6.5)  k max  II  f " (x)  II  . 

XfX 

The  essential  interval  operations  to  implement  the  Kantorovich  theorem  will 
now  be  listed,  it  being  assumed  that  x^  = [x^,x^]  is  an  exact  machim 
vector: 

Kl.  F(x^)  is  evaluated. 

K2.  The  interval  Jacobian  F'(x^)  is  calculated. 

K3.  Y o [F'(x^)l  ^ is  obtained  by  interval  matrix  inversion  of 

fV°>). 

K4.  B0il^0l  1 II  [f  • (xQ)  ] ^11  is  obtained  by  interval  arithmetic  using  (4.8). 

K5.  W = x^  - Y F(x^)  is  computed  using  interval  arithmetic,  so  that 

(1)  - ° 

x e W for  the  exact  Newton  point. 

K6.  nQ  21  |x(0)  - w|  >_  llx(1)  - x(0)||  is  obtained  from  (4.4)  by  .interval 
arithmetic . 

K7.  X o [x^  - 2ri^e,  x ^ + 2ri^e]  is  constructed,  using  interval  ar  tlmetic. 
K8.  F'(X)  is  evaluated  to  obtain 

K9 . F" (X)  by  differentiation. 

K10.  ic  >_  ii(F"(X))  > max  II  f " ( x ) II  is  obtained  from  (6.4)  by  intern  . > ithmetic. 

X(  X 

Kll.  h lB0kri(1l  calculated,  using  interval  arithmetic. 

After  the  machine  number  h has  been  obtained  in  this  way,  it  i.;  checked 
. 1 ° 

versus  the  machine  number  — to  see  if  (3.5)  holds  to  complete  the  au. om.  t ion  of 
Theorem  3.1. 

In  the  case  of  the  Moore  theorem,  interval  matrix  inversion  is  avoulo,  by 
evaluating  f'(x^)  approximately  as  a real  matrix,  say  in  double  pro  ; <n,  and 
then  inverting  the  result  carefully  to  obtain  a real  matrix  Y which  i ; inverse 

of  some  matrix.  (A  failure  of  matrix  inversion  here  or  in  K3  results  lu  .n  exit 
from  the  program  with  an  appropriate  error  indication.)  The  interval  conp'  .1  .it  ions 
required  for  Moore's  theorem  with  y = x ^ are: 


Ml. 

F (■  ' is  evaluated. 

M2. 

, - YF(y)  is  computed  using 

interval 

arithmetic;  one  has  w 

w 

M3. 

n _>  |w|  is  obtained  from  (4.4)  by 

interval 

ar  itlimet  ic . 
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M4 . X > (y  - 2ne,  y + 2nel  is  constructed  »iy  interval  arithmetic  . 

M5.  F'(X)  is  evaluated. 

M6.  ic  ( X)  ■ W + 2nd  - YF' (X)  } I-e,e)  is  constructed  by  interval  arithmetic. 
These  calculations  result  in  the  intervals  X - [a,b],  K(X)  « (c,d],  where 
the  components  of  the  vectors  a,b,c,d  are  all  machine  numbers.  The  verification 
of  K(X)  t x thus  depends  on  checkinq  the  2n  inequalities 

(6.6)  a . <_  c . , d . b . , i « 1 ,2 , . . . ,n , 
bc»tween  machine  numbers. 

Comparison  of  the  lists  of  interval  operations  for  the  two  theorems  reveals 

that  Ml  = Kl , M2  <_  K5  (Y  is  a real  matrix,  so  obtaining  YF'(y)  generally 

requires  fewer  operations  than  for  YQK' (x  ) ) , M3  = K6,  M4  «*  K7 , M5  * K8 . This 

commonality  of  subroutines  made  adaption  of  the  original  program  to  the  new  theorem 

very  easy.  This  leaves  K2 , K3,  K4,  KlO,  Kll  for  impltmentation  of  Theorem  3.1 

as  against  M6  and  the  inversion  of  a real  matrix  for  automation  of  Theorem  3.2. 

In  particular,  K3  (interval  matrix  inversion)  is  tedious,  and  K9  (Hessian  eval- 
1 2 

uation)  requires  — n (n+1)  differentiations  and  interval  evaluations,  while  M6 

is  a simple  interval  matrix-vector  multiplication. 

It  follows  that  the  application  of  Theorem  3.2  is  less  complex  by  far  than  the 

(0)  * 

use  of  Theorem  3.1.  At  least  for  good  approximations  y = x to  x , there  is 
no  significant  difference  in  precision  or  sensitivity  to  offset  this  advantage  of 
Moore's  theorem. 

A comment  is  in  order  on  the  choice  (6.1)  for  Y . For  X * X , 

1 p 

(6.7)  |m (F ' (X  ) ) - F ' (X  ) I = ± w(F'(X  ))  , 

p p 1 2 p 

and  from  (4.16)  and  (4.17), 

(6.8) 

Thus,  one  may  take  o>(p)  = Xp  in  Lemma  4.1,  which  gives  Theorem  4.1  with 


w(F ' (X  ) ) < X w(X  ) = 2Xp  . 

p - p 


(6.9) 


B > II  [m  (F  ' (X  ))]_1||  , n > M w-yll  = II  Yf  (y)  I 
— P ~ 


1 


Here,  however,  p is  chosen  first,  and  (4.21)  must  hold  if  h = BXq  < — . 

— 4 

Computationally,  this  method  differs  from  the  above  in  that  M^  and  M^  are 

eliminated,  M5  follows  the  choice  of  X , m(F'(X  ))  is  then  evaluated  and 

P P 

inverted  to  obtain  Y and  then  Ml,  M2,  and  M6  with  2r\  * p complete  the 
computation.  There  is  thus  no  essential  difference  in  complexity. 
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7.  A numerical  example.  An  actual  comparison  of  computational  efficiency  was 
obtained  by  applying  the  computer  program  [9],  modified  to  include  the  implementa- 
tion of  Moore's  theorem,  to  the  quadratic  system 

9 

(7.1)  x.  - ax. 

with 


j-1 


a . .x . 

13  3 


- 1 


i - 1,2, ... ,9, 


(7.2) 


■ij 


2 2 

t.  (1-t  ) w. 
_i J L 

t . + t . 
i 3 


where  t.,w.,  i = 1,2,..., 9,  are  respectively  the  nodes  and  weights  of  the 
Gaussian  integration  rule  of  order  17  on  the  interval  0 t <_  1 [10,  p.  288]  . 

The  system  (7.1)  was  constructed  as  a discrete  approximation  to  the  nonlinear 
integral  equation 


(7.3)  x ( s)  » 1 + ^ s x ( s ) /l  ■ x ( t ) dt , 0 < s < 1 , 

0 

which  is  a case  of  the  H-equation  of  Chandrasekhar  [1,  p.  105].  The  value  of 
a considered  was  a = 0.7,  and  the  initial  approximation  y = x^  = e gave 


(7.4)  X = [0. 3405052e,  1.6594949e]. 

(Directed  rounding  is  used  in  the  conversion  of  intervals  from  binary  to  decimal; 
hence,  the  decimal  interval  X given  by  (7.4)  contains  the  binary  interval  ston'd 
in  the  computer.) 

The  program  for  the  Kantorovich  theorem  computed 

(7.5)  h = 0.700800  > -5-  , 

^ **  * * * * 
and  thus  failed  to  detect  the  existence  of  a solution  x = (x^ ,x0 , . . . ,xg)  of 

(7.1)  in  X.  A total  of  21.4652  UNIVAC  1110  time  units  were  required,  of  which 

4.7436  were  expended  for  interval  matrix  inversion  (K3  and  K4) , and  12.1394  were 

required  for  the  evaluation  of  the  Hessian  (K9  and  KlO) . 

The  program  for  the  Moore  theorem , on  the  other  hand,  required  only  a total  of 

5.4120  time  units,  and  gave  K(X)  < X , thus  guaranteeing  the  existence  of  a solu- 
• _ _ _ _ 

tion  x . K(X)  . The  components  of  K(X)  = [c,d]  are  given  in  Table  7.1. 

In  tins  simple  example,  the  value  of  h has  not  been  overestimated  badly, 
as  the  Hessian  is  constant,  and  all  its  nonzero  elements  have  the  same  sign.  For 
bilinear  operators  Q of  this  type,  n <0)  = IIQII.  Thus,  the  program  based  on 
Moore's  theorem  provides  more  information  in  this  case,  as  well  as  showing  the 
expected  decrease  in  execution  time.  For  systems  with  nonconstant  Hessians,  the 
advantage  in  speed  of  Theorem  3.2  should  be  even  greater  than  the  4 to  1 advantage 
observed  here.  In  addition,  the  flexibility  of  being  able  to  use  .ntervals  other 
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i 

°i 

di 

1 

1.0042228 

1.0606792 

2 

1.0135268 

1.1943671 

3 

1.0223478 

1.3211143 

4 

1.0293528 

1.4217681 

5 

1.0344997 

1.4957230 

6 

1.0381216 

1.5477668 

7 

1.0405689 

1.5829316 

8 

1.0421079 

1.6050443 

9 

1.0429109 

1.6165838 

Tabic  7.1.  The  Interval  K (X)  ■ tc,dl. 


than  balls  and  the  componentwise  error  bounds  furnished  by  intervals  may  be 
of  significant  advantage  in  practical  computation. 

The  Author  wishes  to  acknowledge  the  assistance  of  Julia  H.  Gray, 

S.  T.  Jones,  and  C.  T.  Kelley  with  this  example,  which  arose  in  an  investi- 
gation completely  unrelated  to  the  subject  of  this  paper. 
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